ABSTRACT
The mouse and keyboard are two of the most important inventions of Human-Computer Interaction (HCI) technology. A wireless or Bluetooth mouse is still not completely device-free, since it needs a battery for power and a dongle to connect it to the PC. In the proposed virtual mouse and keyboard system, this limitation is overcome by using a webcam or a built-in camera to capture hand gestures and a coloured object. Hand gesture recognition is based on Convolutional Neural Networks (CNNs). The computer can be controlled virtually to perform left click, right click, scrolling, volume control, brightness control, and other cursor functions without a physical mouse. A virtual keyboard is controlled by tracking a coloured object using image processing techniques. The proposed system is mainly intended for conferences, where topics are presented on a projector, and for reducing the spread of COVID-19 by eliminating physical contact with, and dependency on, the devices used to control the computer. The main goal is to make the interaction between humans and computers as natural as the interaction between humans.
Keywords- Virtual Mouse, Hand Gesture Recognition, Virtual Keyboard, Object Tracking, 
CNNs, Image Processing. 
1. INTRODUCTION 
With earlier wired technology, users could not move freely, because they were connected to the computer by a wire and their movement was limited to its length. With the development of technologies in augmented reality and of devices used in daily life, these devices are becoming compact, in the form of Bluetooth or wireless technologies [1]. New technologies that improve our lifestyle and make our lives easier appear constantly. With the rapid advancement of technology, the computer has become a very powerful machine designed to make human tasks easier, and as a result HCI (Human-Computer Interaction) has become an important part of our lives. A wireless or Bluetooth mouse still requires several devices: the mouse itself, a dongle to connect it to the PC, and a battery to power it. In the proposed system, by contrast, the user controls the mouse operations with hand gestures captured by a built-in camera or a webcam. Humankind has been transformed by technological advancements, and the invention of the mouse and keyboard by researchers and engineers was a major step forward; nevertheless, there are still situations where interacting with a computer through a keyboard and mouse is not enough. The main objective of the proposed system is to perform computer mouse cursor and keyboard functions using a webcam or a built-in camera instead of a traditional mouse and keyboard.
This paper proposes a virtual mouse and keyboard system that detects hand gestures to perform mouse functions and tracks a coloured object to perform keyboard functions. In the proposed system, the webcam captures frames, the captured frames are processed, the various hand and fingertip gestures are recognized, and the corresponding mouse function is performed. The system can perform all the operations of a traditional mouse pointer, such as left click, right click, double click, multiple selection, scrolling, and drag and drop. The main aim is to create free hand gesture recognition software for laptops and PCs with external webcams.


The virtual keyboard system is based on coloured objects, so the objects to be tracked must first be identified. A range system is initialized to set the minimum and maximum HSV values, which are stored for use during tracking and when performing operations. This is a critical part of the system, because if the range is not set correctly, objects may not be detected or tracked. An image of a keyboard is projected on the screen, and the laptop camera is turned on to capture live video of the user's object movements. When the user points to or selects a particular letter on the keyboard, the computer locks the colour of the user's fingers within the area allocated to the key being pressed, and that letter is displayed on the output screen.
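The HSV range step can be sketched as follows. This is a minimal sketch under assumptions not in the original: the range is derived from HSV pixels sampled from the object, widened by a hypothetical `margin`, clipped to OpenCV's 8-bit HSV limits (H 0-179, S and V 0-255), and stored as JSON. `hsv_range_from_samples`, `save_range`, and `load_range` are illustrative names, not the authors' code.

```python
import json

import numpy as np

# OpenCV stores 8-bit HSV as H in [0, 179] and S, V in [0, 255].
HSV_MAX = np.array([179, 255, 255])


def hsv_range_from_samples(samples, margin=(10, 40, 40)):
    """Return (lower, upper) HSV bounds enclosing the sampled object pixels.

    samples: HSV pixels taken from the object, shape (N, 3).
    margin:  hypothetical tolerance added on each side of the observed range.
    """
    samples = np.asarray(samples, dtype=int).reshape(-1, 3)
    lower = np.clip(samples.min(axis=0) - np.array(margin), 0, HSV_MAX)
    upper = np.clip(samples.max(axis=0) + np.array(margin), 0, HSV_MAX)
    return lower.astype(np.uint8), upper.astype(np.uint8)


def save_range(path, lower, upper):
    """Store the range so the tracking loop can reload it later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"lower": lower.tolist(), "upper": upper.tolist()}, f)


def load_range(path):
    """Read a range written by save_range back as uint8 arrays."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return np.array(data["lower"], np.uint8), np.array(data["upper"], np.uint8)
```

For two sampled green pixels, `hsv_range_from_samples([[58, 200, 150], [62, 220, 180]])` gives a lower bound of (48, 160, 110) and an upper bound of (72, 255, 220); the saturation upper bound is clipped at 255.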


2. RELATED WORKS 


The object tracking method has been used to track coloured objects that operate the system through the laptop webcam. With object tracking, the mouse and its basic operations, such as pointing, selection, and deselection using left click, can be controlled. In a computer system, colours are represented in different formats such as HSV (Hue, Saturation, Value) and BGR (Blue, Green, Red). In the BGR format, a pixel is represented by blue, green, and red parameters, stored in that order. Each parameter usually takes values from 0 to 255: 0 for all parameters represents black, 255 for all parameters represents white, and combinations of values between 0 and 255 create the other colours. Table 1 shows the accuracy results of the existing system.


Table 1: Accuracy Results of the Existing System.

| Input | Mouse event | Accuracy with a plain background (in %) | Accuracy with a non-plain background (in %) |
|---|---|---|---|
| Two-colour object (open gesture) | Mouse movement | 95 | 40 |
| Two-colour object brought closer (closed gesture) | Left button click | 389 | 41 |
| Kept closed for 5 seconds (closed gesture) | Double click | 87 | 42 |
| Single-colour cap (open gesture) | Right button click | 96 | 830 |
| Swipe up or down | Scrolling up or down | Zo | 40 |
3. PROPOSED METHODOLOGY


3.1 Virtual Mouse 

The proposed virtual mouse system uses a convolutional neural network (CNN) to predict the hand gestures made by the user. It consists of three modules.

Image Capturing: 

Hand gesture recognition in Python detects the gestures of a hand in real-time video. The Python OpenCV library is used to capture gestures from the computer's internal camera or a webcam. Hand tracking and segmentation are the primary steps of any hand gesture recognition system. The recognized hand gestures are then used to control the mouse actions.

Data set training: 

CNNs are a very popular deep learning approach in which multiple layers are trained in a robust manner. A CNN can be used to build a model that takes unstructured image inputs and maps them to the correct output categories for classification. Fig. 1 shows the architecture of the proposed hand gesture recognition model using a CNN.
Fig 1: Architecture of the proposed hand gesture recognition model using CNN
The model provides a way to add new gestures and train them accordingly. The dataset consists of 5 gesture classes with 1500 images per class, and each gesture controls a different mouse function. The CNN used in this research to recognize hand gestures is composed of seven convolution layers, three max pooling layers, two fully connected layers, and an output layer. Dropout is used in the network to prevent overfitting. The first convolution layer has 60 filters with a kernel size of 3x3. Its activation function is the Rectified Linear Unit (ReLU), which introduces non-linearity; ReLU has been shown to perform better than other activation functions such as tanh or sigmoid in many deep networks. Because this is the input layer, the input shape must be specified: it is 32x32x1, and the stride is left at its default value. The first convolution layer reduces the spatial size to 30x30, and the second convolution layer reduces it from 30x30 to 28x28. These layers produce feature maps that are passed to the next layer. Next comes a max pooling layer with a pool size of 2x2, which takes the maximum value from each 2x2 window. Because pooling keeps only the maximum value and discards the rest, the spatial size of the representation is reduced progressively; this helps the network understand the images better, since only the more important features are selected. The next layer is another convolution layer with its own set of 3x3 filters and the default stride, again using ReLU as the activation function. It is followed by another max pooling layer with a pool size of 2x2. Further convolution layers follow, and the output of the seventh convolution layer is 6x6. This is followed by the third max pooling layer, where the first dropout is added; it randomly discards 25% of the neurons to prevent the model from overfitting. The output of this layer is passed to the flatten layer.

The flatten layer receives the output of the previous layers and flattens it from a two-dimensional matrix into a vector, so that the fully connected layers can process the features extracted so far. The first fully connected layer has 256 nodes and uses ReLU as the activation function. It is followed by a dropout layer that excludes 25% of the neurons to prevent overfitting. The second fully connected layer has 200 nodes, receives the vector produced by the first fully connected layer, and also uses ReLU as the activation function. It is followed by another dropout layer that excludes 25% of the neurons. The output layer has 5 nodes, one for each hand gesture class, and uses the softmax activation function, which outputs a probability for each class.
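The layer sizes quoted above follow from two rules: a k x k convolution with stride 1 and no padding ('valid') maps an n x n input to (n - k + 1) x (n - k + 1), and 2x2 max pooling halves the size. The text does not give the padding of every layer, so the trace below assumes 'valid' padding for the first three convolutions and 'same' padding for the fourth to seventh. This is one assignment consistent with seven convolution layers, three pooling layers, and a 6x6 output at the seventh convolution; it is not necessarily the authors' exact configuration.

```python
# (layer name, kernel or pool size, padding). Padding for conv4-conv7 is an
# assumption chosen so that the seventh convolution outputs 6x6, as stated.
LAYERS = [
    ("conv1", 3, "valid"),
    ("conv2", 3, "valid"),
    ("pool1", 2, None),
    ("conv3", 3, "valid"),
    ("conv4", 3, "same"),
    ("pool2", 2, None),
    ("conv5", 3, "same"),
    ("conv6", 3, "same"),
    ("conv7", 3, "same"),
    ("pool3", 2, None),
]


def trace_spatial_size(size, layers):
    """Return [(layer name, output side length)] for a square input of side `size`."""
    sizes = []
    for name, k, padding in layers:
        if name.startswith("pool"):
            size = size // k          # non-overlapping k x k max pooling
        elif padding == "valid":
            size = size - k + 1       # stride-1 convolution without padding
        # 'same' padding with stride 1 leaves the size unchanged
        sizes.append((name, size))
    return sizes


for name, side in trace_spatial_size(32, LAYERS):
    print(f"{name}: {side}x{side}")
```

The trace ends at 3x3 after the third pooling layer, so the flatten layer would produce 9 times the number of filters in the seventh convolution layer, a count the text does not specify.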


Mouse Functions 
Operations are performed by capturing the hand movements in each frame.
Mouse Cursor Movement: Keep the index and middle fingers together and move the hand.
Left Click: Move the index finger up and down.
Right Click: Move the middle finger up and down.
Double Click: Bring the index and middle fingers close together, or close them, during cursor movement.
Scrolling: Keep all fingers closed and move the hand up or down.
Drag and Drop: Open all fingers, then close them and move the hand.
Volume Control: Make the OK hand gesture and move the hand left or right.
Brightness Control: Make the OK hand gesture and move the hand up or down.
Stop Mouse Action: Show the palm of the hand.
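One way to turn a recognised gesture into the mouse actions listed above is sketched below. The gesture labels, the horizontal mirroring, and the scroll amount are assumptions, since the paper does not name its five classes. `backend` is expected to offer PyAutoGUI-style `moveTo`, `click`, `rightClick`, `doubleClick`, and `scroll` methods, so the `pyautogui` module itself can be passed in. Volume and brightness control are platform-specific and omitted.

```python
def frame_to_screen(x, y, frame_w, frame_h, screen_w, screen_h, mirror=True):
    """Map a fingertip position in the camera frame to screen pixel coordinates.

    With a camera facing the user, moving the hand to the user's right moves it
    left in the image, so the x axis is flipped (mirror=True) by assumption.
    """
    if mirror:
        x = frame_w - 1 - x
    sx = round(x * (screen_w - 1) / (frame_w - 1))
    sy = round(y * (screen_h - 1) / (frame_h - 1))
    return sx, sy


def dispatch(gesture, backend, pos=None):
    """Perform the mouse action for a recognised (hypothetical) gesture label."""
    if gesture == "move" and pos is not None:
        backend.moveTo(*pos)
    elif gesture == "left_click":
        backend.click()
    elif gesture == "right_click":
        backend.rightClick()
    elif gesture == "double_click":
        backend.doubleClick()
    elif gesture == "scroll_up":
        backend.scroll(3)             # positive clicks scroll up
    elif gesture == "scroll_down":
        backend.scroll(-3)
    elif gesture == "stop":
        pass                          # open palm: no mouse action
    else:
        raise ValueError(f"unknown gesture: {gesture!r}")
```

For example, after `import pyautogui`, calling `dispatch("move", pyautogui, frame_to_screen(x, y, 640, 480, *pyautogui.size()))` would move the cursor to the mapped fingertip position.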


3.2 Virtual Keyboard 

In the proposed system, the object tracking method is used to track coloured objects that operate the system through the laptop webcam. With object tracking, the keyboard and its basic operations, such as space, enter, and backspace, can be controlled. In a computer system, colours are represented in different formats such as HSV (Hue, Saturation, Value) and BGR (Blue, Green, Red). In the BGR format, a pixel is represented by blue, green, and red parameters, stored in that order. Each parameter usually takes values from 0 to 255: 0 for all parameters represents black, 255 for all parameters represents white, and combinations of values between 0 and 255 create the other colours.
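A small NumPy example makes the BGR channel order concrete: an OpenCV image is an array whose last axis holds the blue, green, and red values in that order, so a pure red pixel is (0, 0, 255). The pixel values follow the description above; the reversal to RGB at the end is an illustrative addition, useful when passing frames to libraries that expect RGB.

```python
import numpy as np

# A 1x3 image in OpenCV's default BGR channel order (uint8, values 0-255).
img = np.zeros((1, 3, 3), dtype=np.uint8)
img[0, 0] = (0, 0, 0)        # black: every channel 0
img[0, 1] = (255, 255, 255)  # white: every channel 255
img[0, 2] = (0, 0, 255)      # red: B=0, G=0, R=255 (red is the last channel)

# Reversing the last axis turns BGR into RGB.
rgb = img[..., ::-1]
print(img[0, 2].tolist(), "->", rgb[0, 2].tolist())
```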

For example, a red pixel has an R value of 255, a B value of 0, and a G value of 0; the computer interprets this as “The pixel is 0 parts blue, 0 parts green, and 255 parts red.” HSV also represents pixels by three parameters, but uses hue, saturation, and value. Hue is the shade or colour. Saturation is the intensity of the colour, that is, how colourful a pixel is: a saturation of 255 is the maximum intensity, and a saturation of 0 gives an uncoloured shade of grey (white at full value). Value represents how bright or dark a colour is. Because HSV is used, frames must be converted from BGR to HSV, which OpenCV does with the cv2.cvtColor() function; it takes the image and a flag that determines the type of conversion. Here, a green object is tracked to operate the keyboard.

The proposed system uses computer vision libraries and algorithms to detect the object and its movements and to act on those movements through real-time tracking. The primary focus, however, is on pointing at the keyboard and performing different actions through hand tracking, with the output written to a text file. The virtual keyboard consists of five modules, and Fig 2 shows the flow chart of the keyboard.


Fig 2: Flow chart of the keyboard (Keyboard Layout → Coloured Object Detection → Mask the Coloured Object → Track the Pressed Key Positions → Single Character → Display Keyboard Action in Text File).


Camera Settings 
The device's webcam is used to perform runtime operations. To capture video, a VideoCapture object must be created. Its argument can be either the device index or the name of a video file. The device index is simply a number that specifies which camera to use. Normally only one camera is connected, so 0 is passed; the second camera can be selected by passing 1, and so on. After that, video can be captured frame by frame.
Capturing Frames 

An infinite loop keeps the webcam open for the entire run of the program and captures a frame on every iteration. Each captured frame is converted from the BGR colour space (OpenCV's default) to the HSV colour space. OpenCV offers more than 150 colour-space conversion methods, but only the two most widely used are considered here: BGR to grey and BGR to HSV. The range specified here is that of green, but the range of any colour can be used instead.
Masking Technique 

A mask selects a specific region of the image according to certain rules. Here, a mask is created that contains only the green object. A bitwise AND of the input image with this thresholded image then leaves only the green objects visible. Finally, the frame, the result (res), and the mask are displayed in three separate windows using the imshow function.
Keyboard Data 

An image of a keyboard is projected on the screen, and Fig.3 shows the virtual keyboard layout. Next, the user is asked to wear coloured gloves or stickers on his or her hands, and the laptop camera is turned on to capture live video of the user's finger movements. When the user points to or selects a particular letter on the keyboard, the computer locks the colour of the user's fingers within the area allocated to the key being pressed, and that letter is displayed on the output screen. Colour conversion from RGB to HSV colour codes is used for this purpose.
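The key lookup can be sketched as follows, under assumptions not stated in the text: the keyboard image is a grid of equal-sized cells at a known position, the pressed key is the cell containing the centroid of the masked object, and each selected character is appended to a text file. `LAYOUT`, `KEY_W`, `KEY_H`, `ORIGIN`, `centroid`, `key_at`, and `type_key` are all hypothetical.

```python
import numpy as np

# Hypothetical on-screen layout drawn as rows of equal-sized key cells.
LAYOUT = [
    list("QWERTYUIOP"),
    list("ASDFGHJKL;"),
    list("ZXCVBNM,./"),
]
KEY_W, KEY_H = 60, 60   # key size in pixels (assumption)
ORIGIN = (20, 20)       # top-left corner of the drawn keyboard (assumption)


def centroid(mask):
    """Centre (x, y) of the nonzero pixels in a binary mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.mean()), int(ys.mean())


def key_at(x, y):
    """Return the key whose cell contains (x, y), or None outside the keyboard."""
    col = (x - ORIGIN[0]) // KEY_W
    row = (y - ORIGIN[1]) // KEY_H
    if 0 <= row < len(LAYOUT) and 0 <= col < len(LAYOUT[row]):
        return LAYOUT[row][col]
    return None


def type_key(key, path="output.txt"):
    """Append the selected character to the output text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(key)
```

A real loop would also need to register each key only once per press, for example by requiring the object to stay on a cell for several frames; that debouncing is not shown.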
Display The Frame 

Because imshow() is a function of OpenCV's HighGUI module, waitKey() must be called regularly so that HighGUI can process its event loop. The waitKey() function waits for a key event for a given delay (here, 5 milliseconds). If waitKey() is not called, HighGUI cannot process window events such as redrawing, resizing, and input events, so it should always be called, even with a 1 ms delay.
4. RESULTS AND DISCUSSION


The hand replaces the mouse, so users can perform the mouse functions they want from anywhere within the frame of the webcam. The system has a wide range of real-time uses. It performs very well in good lighting conditions, and it also overcomes the background problem: gestures can be recognized against any background. In addition, the system is slightly more responsive than previously developed systems, as it does not require any training phase for gesture recognition. The training accuracy of the system is 96.37%, as shown in Fig 3, and Fig 4 shows the accuracy of the mouse model over epochs.
Fig: Training Result
Fig 4: Accuracy Over Epochs


Different mouse functions are performed using five hand gestures. Fig 5 shows the hand gestures used to perform mouse actions.
Fig 5: Different Hand Gestures Used For Virtual Mouse
The overall recognition rate of the proposed virtual keyboard is 94.62%. The results indicate that the recognition rate depends on both the accuracy of each typing position and the type of light source. Fig 6 shows the output of the keyboard.


Fig 6: Output of the Keyboard Shown in Notepad.


CONCLUSION 

The main objective of the virtual mouse system is to control the mouse cursor functions using hand gestures instead of a physical mouse, while the virtual keyboard is controlled by tracking a coloured object. The proposed system uses a webcam or a built-in camera that detects hand gestures and processes frames to perform the corresponding mouse functions, and detects the coloured object to perform the corresponding keyboard functions, writing the output to a text file. From the results of the model, it can be concluded that the proposed virtual mouse and keyboard system performs very well, achieves greater accuracy than the existing models, and overcomes most of the limitations of existing systems.

Because the proposed model has greater accuracy, the virtual mouse and keyboard can be used in real-world applications. They can also help reduce the spread of COVID-19, since the proposed system is operated virtually with hand gestures and a coloured object instead of a traditional device. The virtual mouse can perform right click, left click, double click, scrolling, and drag and drop, and the virtual keyboard can be configured according to the user's needs. The system therefore has many applications: the virtual mouse can be used for conferences and presentations, and the virtual keyboard for gaming. It can also be used by people with disabilities who can move only their hands, allowing them to communicate through the proposed system. In future work, swipe keypads that detect gestures in the air could also be implemented, which would improve the results when typing quickly. The technique could also be further improved for use on a smart TV.